Accurate prediction of drug combination risk levels based on relational graph convolutional network and multi-head attention

Background Accurately identifying the risk level of drug combinations is important for investigating the mechanisms of combination medication and adverse reactions. Most existing methods can only predict whether two drugs interact, but cannot directly determine the risk level of the interaction. Methods In this study, we propose AERGCN-DDI, a multi-class drug combination risk prediction model built on a relational graph convolutional network with a multi-head attention mechanism. Drug-drug interaction events with varying risk levels are modeled as a heterogeneous information graph. Attribute features of drug nodes and links are learned from compound chemical structure information. Finally, AERGCN-DDI predicts the risk level of a drug combination using heterogeneous graph neural network and multi-head attention modules. Results To evaluate the effectiveness of the proposed method, five-fold cross-validation and ablation studies were conducted. Furthermore, we compared its predictive performance with baseline models and other state-of-the-art methods on two benchmark datasets. Empirical studies demonstrated the superior performance of AERGCN-DDI. Conclusions AERGCN-DDI is a valuable tool for predicting the risk levels of drug combinations, thereby aiding clinical medication decision-making, mitigating severe drug side effects, and improving patient clinical prognosis.


Background
Human disease is a major threat to human health. Because of the complexity of diseases and the multiple benefits of combining drugs, combination therapy [1] is often used in their treatment. For example, multi-drug therapy can reduce drug dosages and improve the therapeutic effect. However, taking two different drugs at the same time can produce effects that belong to neither drug alone; these are known as drug-drug interactions (DDIs). In recent years, DDI prediction has become an important research topic in bioinformatics. Zwart et al. found that 28% of all hospitalized patients had at least one potential DDI, with a 1.4% incidence of contraindicated or life-threatening interactions [2]. Mousavi et al. found that the most common type of interaction observed was type C (78.6%), which does not cause serious or fatal consequences, whereas 9.2% of patients had type X interactions, which can be harmful and life-threatening [3]. There is therefore a practical need to identify the exact risk level of an interaction between drugs. Traditional in vitro and in vivo experiments are time-consuming and labour-intensive [4,5]. Before the advent of high-throughput technologies [6,7], one experiment could detect only a single kind of drug-drug interaction. With so many medications available, it is difficult for researchers to identify DDIs one by one in this way, which limits the effectiveness of DDI risk identification. Computational methods, which build algorithmic models to predict possible DDI events, have therefore gained more attention. These methods can be roughly divided into three categories: matrix-based methods, deep learning-based methods, and graph-based methods.
Matrix-based methods typically incorporate background information about drugs into a matrix decomposition and then compute drug-drug interaction events. Zhang et al. proposed MRMF, a manifold regularized matrix factorization method that introduces manifold regularization based on drug characteristics into the matrix decomposition to predict potential drug interaction events [8]. Zhang et al. [9] used a sparse feature learning (SFL) method to project multiple drug features into a common latent (approximate) interaction matrix, and introduced linear neighbourhood regularization (LNR) based on known drug interactions to predict DDI events. Yu et al. designed DDINMF, a DDI prediction model based on nonnegative matrix factorization (NMF) [10]. Shi et al. developed TMFUF, a unified framework based on triple matrix factorization, for predicting DDI events from drug side effects [11]. One limitation is that matrix-based methods cannot merge the neighborhood characteristics of nodes.
Over the past few years, deep learning approaches have achieved outstanding results and significant progress in many fields [12][13][14]. Karim et al. proposed using a CNN and an LSTM to predict DDI events [15]. Shukla et al. proposed integrating convolutional neural networks, recurrent neural networks and hybrid density networks to predict DDI events [16]. Chen et al. introduced a two-layer architecture, including cross-over (CNN-based) and scalar-level modules, that combines internal and external information at different granularities [17]. Yi et al. proposed a recurrent neural network model with multiple attention layers [18]. However, deep learning-based methods are mostly designed for Euclidean data, which makes them not entirely suitable for drug networks.
Graph-based models, by contrast, are better suited to non-Euclidean data. Arnold K. Nyamabo et al. proposed a message-passing neural network in which edges have learnable weights and which learns from molecular structures to predict DDI events [19]. Lin et al. proposed the knowledge graph neural network (KGNN), an end-to-end framework that predicts DDI events by exploring the topology of drugs in a knowledge graph [20]. Feng et al. introduced DPDDI, a deep predictor of drug-drug interactions that uses graph convolutional networks (GCNs) to learn low-dimensional feature representations and a deep neural network (DNN) to train the model [21]. Yu et al. proposed SumGNN, which consists of several submodules that aggregate information more effectively and perform multi-category prediction [22]. Wang et al. proposed MIRACLE, an end-to-end multi-view graph representation learning framework for drug embedding that includes a bond-aware message-passing network and a GCN encoder [23]. Ma et al. proposed using graph autoencoders to model heterogeneous correlations between different views and target tasks, with attention mechanisms added to improve interpretability [24].
Recent research has made significant progress in predicting drug-drug interaction events. Systematic reviews highlight the critical role of computational methods in supporting judicious drug repurposing, which has been extensively applied to viral cancers, psoriasis, COVID-19, and specific cancer types such as HPV-related cervical and endometrial cancers [30][31][32][33][34][35][36]. Nonetheless, most of these methods still have several limitations. First, they often require comprehensive and diverse drug attribute information, which is burdensome to collect when making predictions for newly emerged drugs. Second, the behavioural characteristics of drug nodes in complex network structures are typically underutilized: most computational models consider only the attributes of the drugs themselves and use them for simple classification tasks. Third, most existing methods aim only to predict whether approved drug pairs have adverse effects, ignoring the classification of risk levels across different drug combinations. Yet properly classifying the risk levels of drug combinations is especially crucial for helping medical staff make informed drug recommendations.
In this study, we propose AERGCN-DDI, a method based on a relational graph convolutional network and multi-head attention for predicting the risk levels of drug combinations. The workflow of AERGCN-DDI is shown in Fig. 1. More specifically, a heterogeneous information graph is constructed by treating drugs as nodes and drug-drug interaction events with different risk levels as edges. Molecular fingerprints generated with the RDKit [37] tool are then used as node features, and link features are obtained by concatenating the features of the two nodes on either side. Principal component analysis (PCA) is employed to reduce the dimension of these primary attribute features. Finally, a heterogeneous graph neural network with multi-head attention modules is used to predict DDI events. AERGCN-DDI is tested on predicting the combination risk of both approved drugs and newly emerged drug compounds. To evaluate the effectiveness of the proposed method, five-fold cross-validation and ablation studies were conducted. Experimental results demonstrate that AERGCN-DDI can serve as a useful tool for predicting the risk levels of drug combinations, which can help guide clinical medication decisions, reduce serious drug side effects, and improve patient clinical prognosis.

Benchmark datasets
A hierarchical multi-class drug combination dataset was constructed from DDInter [38], which contains about 0.24 million DDI associations among 1833 approved drugs. Each drug is annotated with basic chemical and pharmacological information and its interaction network. DDI entries carry rich professional annotations, including severity, mechanism description, strategies for managing potential side effects, and alternative medications. After removing drugs for which compound SMILES descriptors could not be obtained, 1634 drug nodes remained.
The risk level of each drug interaction is labeled by senior pharmacists and divided into four levels: Major, Moderate, Minor, and Unknown. Major denotes life-threatening interactions that require medical intervention; Moderate denotes interactions that cause disease exacerbation or a change in therapy; Minor denotes interactions that limit clinical effects and usually do not require therapy changes. DDIs lacking mechanism descriptions were classified as 'Unknown'. In total, we obtained 221,132 DDI events, of which 47,182 were Unknown, 10,861 were Minor, 129,472 were Moderate, and 33,617 were Major, as shown in Fig. 2.
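The class counts above sum to the reported total, and a few lines of arithmetic make the imbalance concrete. The sketch below uses only the counts stated in the text to compute each level's share and, purely as an illustration, "balanced" inverse-frequency weights; the text does not say that AERGCN-DDI reweights its loss.

```python
# Per-class DDI event counts for the DDInter benchmark, as reported in the text.
counts = {"Unknown": 47_182, "Minor": 10_861, "Moderate": 129_472, "Major": 33_617}

total = sum(counts.values())

# Share of all events that each risk level accounts for.
proportions = {level: n / total for level, n in counts.items()}

# Hypothetical "balanced" inverse-frequency weights: w_c = total / (K * n_c),
# so every class contributes the same weighted count total / K. This only
# quantifies the imbalance; it is not a documented part of AERGCN-DDI.
num_classes = len(counts)
class_weights = {level: total / (num_classes * n) for level, n in counts.items()}

for level, n in counts.items():
    print(f"{level:>8}: {n:>7,} events  {proportions[level]:6.2%}  "
          f"weight {class_weights[level]:.2f}")
```

With these counts, Moderate events make up roughly 58.5% of the data and Minor events just under 5%, which is consistent with the later emphasis on imbalance-sensitive metrics such as AUPR and F1.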
The second dataset is a large-scale drug-drug event dataset constructed by Deng et al. [39] from DrugBank [40], comprising 572 drugs and 37,264 pairwise DDIs classified into 65 event types. The percentage of each event type in this dataset is shown in Fig. 3.

Construction of heterogeneous information graph
Drug-drug interaction events with different risk levels can be modeled as a heterogeneous information graph in which each node represents a drug and each edge represents the risk level of the interaction between two drug nodes. Formally, a drug-drug risk rating matrix can be defined as $Y \in \{0,1,2,3\}^{|N_d| \times |N_d|}$, where $|N_d|$ denotes the number of drugs. Each entry is $y_{i,j} = X$ ($i, j \in N_d$, $i \neq j$), where the value $X$ is the risk rating coefficient of the drug pair: the higher $X$, the higher the risk rating.
Fig. 1 The workflow of the proposed AERGCN-DDI method
Equivalently, an n-ary fact $F = ((s, r, o), \{(a_i : v_i)\}_{i=1}^{m})$ can be restructured as a heterogeneous graph $G = (V, E)$. Graphs are also referred to as networks, in which entities are assigned to the vertex set $V$ and relationships to the edge set $E$. In the DDI risk level network, there are four types of undirected edges between vertices. The vertex set V contains all entities, resulting in
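As a minimal sketch of the graph construction, assuming interaction records arrive as (drug, drug, severity) triples, the function below assigns node indices, fills a sparse symmetric rating matrix, and groups undirected edges by risk level, giving one relation type per level as in the four-edge-type network described above. The function name `build_ddi_graph` and the `RISK_LEVEL` mapping are hypothetical, and the toy records use placeholder drug names rather than real DDInter entries.

```python
from collections import defaultdict

# Hypothetical label-to-relation mapping. The text states only that larger
# values mean higher risk; putting "Unknown" at 0 is an assumption.
RISK_LEVEL = {"Unknown": 0, "Minor": 1, "Moderate": 2, "Major": 3}


def build_ddi_graph(events):
    """Build a heterogeneous DDI graph from (drug_a, drug_b, severity) triples.

    Returns:
      drug_index: drug id -> node index
      rating: sparse symmetric rating matrix Y as {(i, j): level}, i != j
      edges_by_relation: risk level -> list of undirected edges (i, j), i < j
    """
    drug_index, rating = {}, {}
    edges_by_relation = defaultdict(list)
    for a, b, severity in events:
        if a == b:
            continue  # Y is only defined for pairs of distinct drugs
        i = drug_index.setdefault(a, len(drug_index))
        j = drug_index.setdefault(b, len(drug_index))
        level = RISK_LEVEL[severity]
        rating[(i, j)] = rating[(j, i)] = level  # undirected: fill both entries
        edges_by_relation[level].append((min(i, j), max(i, j)))
    return drug_index, rating, dict(edges_by_relation)


# Toy events with placeholder drug names (not real DDInter records).
events = [
    ("drugA", "drugB", "Major"),
    ("drugA", "drugC", "Moderate"),
    ("drugB", "drugD", "Moderate"),
    ("drugD", "drugC", "Unknown"),
]
drug_index, Y, edges = build_ddi_graph(events)
print(drug_index)
print(edges)
```

Storing Y sparsely keeps memory proportional to the number of known interactions rather than to the square of the number of drugs.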

Leveraging molecular fingerprints for drug attribute learning
To minimize reliance on extensive attribute information, only molecular fingerprint sequences are used as drug features. This yields a lightweight, user-friendly model that fits the practical situation in early-stage drug development, where detailed information about a new drug is lacking. It is generally assumed that compounds with similar structures have similar physical and chemical properties, and the same assumption is made about their biological activities. This criterion is known as the similarity principle of Johnson and Maggiora [21], and it is also the basis for computer-aided risk assessment of drug combinations. A molecular fingerprint is a numerical representation that describes the structural information of a drug compound, and previous studies have shown that fingerprints express molecular structure effectively. We therefore use RDKit [41] to encode SMILES sequences into Morgan fingerprints, which serve as the attribute features of drug nodes. In the DDI link prediction task, the attributes of the two drug nodes involved in an interaction event are concatenated as the edge attributes and fed into the model as the first part of the input. The entire DDI matrix is used as the second part of the input to extract the topological information of the DDI graph.
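The fingerprint-based features can be illustrated without RDKit installed. The sketch below assumes fingerprints are already available as equal-length bit lists; in practice they might be produced with RDKit, for example `AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)`, where the radius 2 and 2048 bits are placeholder values because the text does not state them. `edge_attributes` concatenates the two endpoint fingerprints as described; `tanimoto` is included only to illustrate the structural-similarity assumption, and the text does not say the model uses Tanimoto similarity.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0


def edge_attributes(fingerprints, pairs):
    """Concatenate the fingerprints of the two drugs in each interacting pair."""
    return [list(fingerprints[u]) + list(fingerprints[v]) for u, v in pairs]


# Toy 8-bit fingerprints; real Morgan fingerprints are much longer.
fps = {
    "drugA": [1, 0, 1, 1, 0, 0, 1, 0],
    "drugB": [1, 0, 1, 0, 0, 0, 1, 0],
    "drugC": [0, 1, 0, 0, 1, 1, 0, 1],
}
print(tanimoto(fps["drugA"], fps["drugB"]))
print(tanimoto(fps["drugA"], fps["drugC"]))
edge_feats = edge_attributes(fps, [("drugA", "drugB"), ("drugB", "drugC")])
print(len(edge_feats[0]))
```

Concatenation doubles the feature length per edge, which is one reason a dimensionality reduction step is useful before training.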
Furthermore, to assess how the dimension of the molecular fingerprint features affects prediction performance, PCA was used to reduce the attribute features to different dimensions. PCA is a widely used dimensionality reduction method. Its main idea is to map n-dimensional features onto k new orthogonal features, called principal components: $Y = XP$, where $X$ is the centered feature matrix with n columns and $P$ is an $n \times k$ matrix whose columns are the k leading eigenvectors of the covariance matrix of $X$. When k is less than n, the dimensionality is reduced [42].
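To make the PCA mapping from n to k dimensions concrete, the following pure-Python sketch centers the data, builds the sample covariance matrix, and finds the leading eigenvectors by power iteration with deflation. It is an illustrative stand-in, not the authors' code; for real fingerprint matrices a library routine such as scikit-learn's `sklearn.decomposition.PCA(n_components=300)` would be the practical choice.

```python
import math
import random


def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def pca(X, k, iters=500, seed=0):
    """Project the rows of X onto their top-k principal components.

    Centers the data, builds the sample covariance matrix, and extracts the
    leading eigenvectors by power iteration with deflation.
    Returns (Y, P) where P is n_features x k and Y = X_centered @ P.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[f] for row in X) / n for f in range(d)]
    Xc = [[row[f] - means[f] for f in range(d)] for row in X]
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, _matvec(C, v)))  # Rayleigh quotient
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]

    P = [[components[c][f] for c in range(k)] for f in range(d)]
    Y = [[sum(r[f] * P[f][c] for f in range(d)) for c in range(k)] for r in Xc]
    return Y, P


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # points on the line y = 2x
Y, P = pca(X, k=1)
print(P, Y)
```

For points on a line, the single component recovered is the line's direction (1, 2)/√5 up to sign, and the projected coordinates sum to zero because the data are centered.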

Enhancing drug combination risk prediction with relational graph convolutional networks
In this section, we introduce a two-layer relational graph convolutional network (RGCN) [43] tailored to capture intricate topology information within DDI graphs. RGCN extends conventional graph convolutional networks (GCNs) by distinguishing individual relationship types and assigning each a distinct weight matrix. Unlike GCNs, RGCNs can handle heterogeneous graphs, making them well suited to DDI networks [44]. The constructed DDI network contains four types of edges, each with its own weights learned during training. RGCN updates each node's representation by aggregating information from neighboring nodes, which lets nodes learn about their topological context while preserving their own characteristics. The propagation model is as follows: Here, $x_{1,j}$ and $x_{2,j}$ are the corresponding components of the feature vectors of nodes i and j. Equation (3) uses a double loop, over relation types and over the neighbors under each relation, to aggregate features from adjacent nodes. The output feature of the central node is produced by adding its own feature to the aggregated features and applying an activation function. To mitigate overfitting on rare relationships, two separate methods for regularizing the weights of RGCN layers are available. Basis decomposition:
$$W_r = \sum_{b=1}^{B} a_{rb} V_b,$$
i.e., each relation weight $W_r$ is a linear combination of shared basis transformations $V_b$ with coefficients $a_{rb}$, such that only the coefficients depend on r. Block-diagonal decomposition:
$$W_r = \bigoplus_{b=1}^{B} Q_{br} = \operatorname{diag}(Q_{1r}, \ldots, Q_{Br}),$$
where each $W_r$ is a block-diagonal matrix and each $Q_{br} \in \mathbb{R}^{(d/B) \times (d/B)}$ contributes one diagonal block. For $B = d$, each $Q_{br}$ has dimension 1, so $W_r$ becomes a diagonal matrix. AERGCN-DDI uses basis decomposition, with the number of bases set to a multiple of the number of drug-pair risk levels.
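The following is a minimal sketch of one RGCN layer with basis decomposition, written in plain Python so that each step of the double loop is visible. It assumes the standard update of Schlichtkrull et al. [43], $h_i' = \sigma\big(W_0 h_i + \sum_{r} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r h_j\big)$ with $c_{i,r} = |\mathcal{N}_i^r|$ and ReLU as $\sigma$; the paper's own Eq. (3) is not reproduced in this text, so the normalization constant and activation are assumptions. The function names are hypothetical.

```python
def matvec(M, x):
    """Dense matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def basis_weights(bases, coeffs):
    """Basis decomposition: W_r = sum_b a_rb * V_b for every relation r."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [[sum(a_rb * V[o][i] for a_rb, V in zip(a_r, bases)) for i in range(cols)]
         for o in range(rows)]
        for a_r in coeffs
    ]


def rgcn_layer(h, edges_by_relation, W_self, bases, coeffs):
    """One R-GCN layer over undirected, relation-typed edges.

    h: node feature vectors; edges_by_relation: {r: [(u, v), ...]} with r in
    0..R-1 indexing coeffs; W_self: self-connection weight W_0.
    """
    W = basis_weights(bases, coeffs)
    n = len(h)
    out = [matvec(W_self, h[i]) for i in range(n)]  # self-connection W_0 h_i
    for r, edges in edges_by_relation.items():  # outer loop: relation types
        neighbors = [[] for _ in range(n)]
        for u, v in edges:  # undirected: each edge is a neighbor both ways
            neighbors[u].append(v)
            neighbors[v].append(u)
        for i in range(n):  # inner loop: neighbors of i under relation r
            c = len(neighbors[i])  # normalization constant c_{i,r} = |N_i^r|
            for j in neighbors[i]:
                msg = matvec(W[r], h[j])
                out[i] = [o + m / c for o, m in zip(out[i], msg)]
    return [[max(0.0, o) for o in row] for row in out]  # ReLU activation


eye = [[1.0, 0.0], [0.0, 1.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = {0: [(0, 1)], 1: [(0, 2)]}
# One shared basis (B = 1); relation 0 uses coefficient 1, relation 1 uses 2.
h_next = rgcn_layer(h, edges, W_self=eye, bases=[eye], coeffs=[[1.0], [2.0]])
print(h_next)
```

With basis decomposition the number of free parameters grows with B rather than with the number of relations, which is what makes rare risk levels less prone to overfitting.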

Leveraging multi-head attention for drug interaction prediction
Predicting interactions for newly emerged drugs differs from doing so for approved drugs because new drugs lack interaction information, which calls for models with better neighborhood aggregation and stronger predictive performance. This difference prompted us to adopt the multi-head self-attention mechanism of Transformers as a broad and powerful approach to encoding knowledge graphs and addressing link prediction.
The multi-head attention mechanism updates features as follows: where $x_1$ and $x_2$ are the original feature vectors of two nodes [45].
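In our research, we treat unknown relationships as one type of interaction between drugs. After aggregating node and edge features, we generate a set of embedding vectors Z for the predicted edges. We apply the multi-head attention mechanism to the latent representation sequence Z and then score the different edge types in the classification task. Layer normalization is computed as follows:
$$\mathrm{LN}(z) = \gamma \odot \frac{z - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta, \quad \mu = \frac{1}{d}\sum_{k=1}^{d} z_k, \quad \sigma^2 = \frac{1}{d}\sum_{k=1}^{d} (z_k - \mu)^2,$$
where d is the dimension of z, $\gamma$ and $\beta$ are learnable scale and shift parameters, and $\epsilon$ is a small constant for numerical stability.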
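Because the attention update equation is not reproduced in this text, the sketch below assumes the standard scaled dot-product multi-head attention of Transformers, applied to a sequence Z of edge embeddings, followed by a residual connection and layer normalization. The residual-then-normalize layout, the weight shapes, and all function names are assumptions for illustration, not the authors' implementation.

```python
import math


def matmul(A, B):
    """Dense matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def layer_norm(x, gamma, beta, eps=1e-5):
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [g * (v - mu) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


def multi_head_self_attention(Z, heads, W_o):
    """Scaled dot-product self-attention over the rows of Z.

    heads: list of (W_q, W_k, W_v), each d_model x d_head.
    W_o: (num_heads * d_head) x d_model output projection.
    """
    head_outputs = []
    for W_q, W_k, W_v in heads:
        Q, K, V = matmul(Z, W_q), matmul(Z, W_k), matmul(Z, W_v)
        d_k = len(W_k[0])
        rows = []
        for q in Q:
            weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K])
            rows.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
        head_outputs.append(rows)
    # Concatenate the heads for each position, then project back to d_model.
    concat = [[x for head in head_outputs for x in head[t]] for t in range(len(Z))]
    return matmul(concat, W_o)


def attention_block(Z, heads, W_o, gamma, beta):
    """Residual connection followed by layer normalization (an assumed layout)."""
    A = multi_head_self_attention(Z, heads, W_o)
    return [layer_norm([z + a for z, a in zip(zr, ar)], gamma, beta) for zr, ar in zip(Z, A)]


eye = [[1.0, 0.0], [0.0, 1.0]]
W_o = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]  # averages the two heads
Z = [[1.0, 2.0], [1.0, 2.0]]  # two identical edge embeddings
print(multi_head_self_attention(Z, [(eye, eye, eye)] * 2, W_o))
print(attention_block(Z, [(eye, eye, eye)] * 2, W_o, gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

Each head has its own projections, so different heads can attend to different aspects of the edge embeddings before the output projection mixes them.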

The implementation of the AERGCN-DDI model
The AERGCN-DDI model uses a multilayer message-passing mechanism to capture high-order neighborhood information. To improve the prediction of potential DDI events (link prediction), we recalculate the information of nodes and edges. Specifically, each edge's features are generated by combining the edge's own features with those of its two adjacent nodes. The entire model is described in Algorithm 1 below.
Let $G(\mathcal{V}, \mathcal{E})$ be a graph with node set $\mathcal{V}$ and edge set $\mathcal{E}$. The features of node v and edge $(u, e, v) \in \mathcal{E}$ are denoted $x_v \in \mathbb{R}^{d_1}$ and $w_e \in \mathbb{R}^{d_2}$, respectively. At step t + 1, the message-passing paradigm consists of an edge-wise and a node-wise computation [46]:
$$\text{Edge-wise: } m_e^{(t+1)} = \phi\left(x_v^{(t)}, x_u^{(t)}, w_e^{(t)}\right), \quad (u, e, v) \in \mathcal{E},$$
$$\text{Node-wise: } x_v^{(t+1)} = \psi\left(x_v^{(t)}, \rho\left(\left\{ m_e^{(t+1)} : (u, e, v) \in \mathcal{E} \right\}\right)\right).$$
Here, $\phi$ is a message function defined on each edge, and $\psi$ updates node features by aggregating incoming messages through the reduce function $\rho$. A two-layer RGCN and a multi-head self-attention mechanism are employed to better integrate different types of neighborhood information and capture the network structure. Additionally, we use the AdamW optimizer [47] to train the models by minimizing the cross-entropy loss function.
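The edge-wise and node-wise computation can be written as one generic step parameterized by the message, reduce, and update functions. In the sketch below, the concrete choices of `phi`, `rho` and `psi` are hypothetical (simple additions and a sum reduction) and only show the data flow. Messages travel from u to v, so an undirected DDI edge would need to be listed in both directions.

```python
def message_passing_step(x, w, edges, phi, rho, psi):
    """One step of edge-wise then node-wise message passing.

    x: {node: feature list}; w: {edge id: feature list};
    edges: (u, e, v) triples, with messages flowing from u to v along edge e.
    """
    # Edge-wise: m_e = phi(x_v, x_u, w_e) for every edge (u, e, v).
    messages = {e: phi(x[v], x[u], w[e]) for u, e, v in edges}
    # Collect the incoming messages of each node.
    inbox = {node: [] for node in x}
    for u, e, v in edges:
        inbox[v].append(messages[e])
    # Node-wise: x_v' = psi(x_v, rho({m_e : (u, e, v) in edges})).
    return {node: psi(x[node], rho(inbox[node])) for node in x}


# Hypothetical choices: phi adds the edge feature to the source node feature,
# rho sums messages element-wise, psi adds the aggregate to the node feature.
def phi(x_v, x_u, w_e):
    return [a + b for a, b in zip(x_u, w_e)]


def rho(msgs):
    return [sum(col) for col in zip(*msgs)] if msgs else None


def psi(x_v, agg):
    return list(x_v) if agg is None else [a + b for a, b in zip(x_v, agg)]


x = {0: [1.0], 1: [2.0], 2: [3.0]}
w = {"e01": [10.0], "e21": [20.0]}
edges = [(0, "e01", 1), (2, "e21", 1)]
print(message_passing_step(x, w, edges, phi, rho, psi))
```

Stacking two such steps lets information from two-hop neighbors reach each node, which is the sense in which the model captures high-order neighborhood information.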
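The cross-entropy loss is defined as
$$\mathcal{L} = -\sum_{i=1}^{m} y_i \log \hat{y}_i,$$
where $y_i$ indicates whether the sample belongs to class i (the target), $\hat{y}_i$ is the predicted probability of class i, and m is the number of classes.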
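As a worked example of the loss over the four risk levels, the sketch below evaluates the cross-entropy for a one-hot target and a predicted distribution. Averaging over a batch is shown as a common, assumed reduction, and the `eps` clamp is an added safeguard against log(0).

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """L = -sum_i y_i * log(yhat_i) for a single sample over m classes."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_prob))


def mean_cross_entropy(batch_true, batch_prob):
    """Average the per-sample loss over a batch (an assumed reduction)."""
    return sum(cross_entropy(t, p) for t, p in zip(batch_true, batch_prob)) / len(batch_true)


# One-hot target for class index 2 of 4 risk levels, and a predicted distribution.
y = [0, 0, 1, 0]
y_hat = [0.1, 0.2, 0.6, 0.1]
print(cross_entropy(y, y_hat))  # equals -log(0.6)
```

With a one-hot target only the true class contributes, so the loss reduces to the negative log-probability the model assigns to the correct risk level.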

Baseline methods
Graph models produce network embeddings by mapping high-dimensional graph data to low-dimensional vectors. To demonstrate the performance and robustness of the proposed AERGCN-DDI, we benchmark it against a variety of state-of-the-art GNN models, including GCN [48], GAT and GraphSAGE [49], which rely on local neighborhood aggregation of nodes and can be used for link prediction.
GCN. The essential purpose of GCN is to extract spatial features from topological graphs. A GCN layer operates through layer-wise propagation of node features.
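The layer-wise propagation rule is
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right),$$
where $\tilde{A} = A + I_N$ is the adjacency matrix with added self-loops, $I_N$ is the identity matrix, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $H^{(l)}$ contains the hidden features of the nodes in the l-th layer, $W^{(l)}$ is a trainable weight matrix, and $\sigma$ is an activation function that passes information from one layer to the next [44].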
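A dense, pure-Python sketch of a single GCN layer with self-loops and symmetric degree normalization, using ReLU as the activation (an assumption), is shown below. It is meant for small toy graphs, not as the baseline implementation used in the experiments.

```python
import math


def gcn_layer(A, H, W, activation=lambda v: max(0.0, v)):
    """H' = sigma(D^-1/2 (A + I) D^-1/2 H W) for a dense 0/1 adjacency matrix A."""
    n = len(A)
    # Add self-loops: A_tilde = A + I_N.
    A_t = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_t]  # diagonal of D_tilde
    # Symmetric normalization D^-1/2 A_tilde D^-1/2.
    A_hat = [[A_t[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f_in, f_out = len(W), len(W[0])
    AH = [[sum(A_hat[i][k] * H[k][f] for k in range(n)) for f in range(f_in)]
          for i in range(n)]
    return [[activation(sum(AH[i][f] * W[f][o] for f in range(f_in))) for o in range(f_out)]
            for i in range(n)]


# Two connected nodes: each ends up with the average of both features.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
```

Unlike the RGCN layer, this update uses a single weight matrix for all edges, so it cannot distinguish the four risk levels.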
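GAT. GAT uses a self-attention mechanism to aggregate neighbor nodes, adaptively assigning weights to different neighbors and increasing model accuracy. To make coefficients easily comparable across different nodes, they are normalized across all choices of j using the softmax function:
$$\alpha_{ij} = \operatorname{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}.$$
The attention mechanism is a single-layer feedforward neural network, so the coefficients can be written as
$$\alpha_{ij} = \frac{\exp\left(\operatorname{LeakyReLU}\left(\mathbf{a}^{T} [W h_i \,\|\, W h_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\operatorname{LeakyReLU}\left(\mathbf{a}^{T} [W h_i \,\|\, W h_k]\right)\right)}.$$
GraphSAGE. In GraphSAGE, each node samples only a subset of its neighbors to iteratively update its own features. GraphSAGE can be trained in either an unsupervised or a supervised manner. Unsupervised training uses negative sampling with the following loss:
$$J_G(z_u) = -\log\left(\sigma\left(z_u^{T} z_v\right)\right) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log\left(\sigma\left(-z_u^{T} z_{v_n}\right)\right),$$
where v is a node that co-occurs near u, $P_n$ is the negative sampling distribution, and Q is the number of negative samples. Aggregators include the LSTM aggregator, mean aggregator, pooling aggregator, and GCN convolution aggregator. LSTM aggregator: LSTMs have strong feature extraction capabilities, but because there is no natural ordering among neighbor nodes, the neighbors are randomly permuted before being fed into the LSTM.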
Mean aggregator: when aggregating node v, the mean of the feature vectors of v and its neighbors is computed:
$$h_v^{k} \leftarrow \sigma\left(W \cdot \operatorname{MEAN}\left(\{h_v^{k-1}\} \cup \{h_u^{k-1}, \forall u \in \mathcal{N}(v)\}\right)\right).$$
Pooling aggregator: the feature vectors of all neighbor nodes are passed through a fully connected layer, followed by max-pooling:
$$\operatorname{AGGREGATE}_k^{\text{pool}} = \max\left(\left\{\sigma\left(W_{\text{pool}} h_{u_i}^{k} + b\right), \forall u_i \in \mathcal{N}(v)\right\}\right).$$
DEML [50]. Wang et al. proposed an ensemble-based multi-task neural network for the simultaneous optimization of five synergy regression prediction tasks, synergy classification, and DDI classification tasks. DEML uses chemical and transcriptomic information as inputs. It adopts a novel hybrid ensemble layer structure to construct higher-order representations from different perspectives, and its task-specific fusion layer joins the representations for each task using a gating mechanism.
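Returning to the GAT and GraphSAGE baselines described above, the sketch below computes GAT attention coefficients for one node and the GraphSAGE mean and max-pooling aggregations. It omits multi-head GAT, neighbor sampling, and the final W and σ of the mean aggregator; the LeakyReLU slope of 0.2 follows common practice and is an assumption here.

```python
import math


def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_coefficients(h, W, a, i, neighbors):
    """alpha_ij = softmax over j in N_i of LeakyReLU(a^T [W h_i || W h_j])."""
    z_i = _matvec(W, h[i])
    # z_i + _matvec(...) is list concatenation, i.e. [W h_i || W h_j].
    scores = [leaky_relu(sum(ak * ck for ak, ck in zip(a, z_i + _matvec(W, h[j]))))
              for j in neighbors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sage_mean(h_v, neigh):
    """MEAN({h_v} U {h_u : u in N(v)}); W and sigma would be applied afterwards."""
    vecs = [h_v] + neigh
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sage_max_pool(neigh, W_pool, b):
    """Element-wise max over neighbors of ReLU(W_pool h_u + b)."""
    transformed = [[max(0.0, s + bias) for s, bias in zip(_matvec(W_pool, h_u), b)]
                   for h_u in neigh]
    return [max(col) for col in zip(*transformed)]


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 1.0, 0.0]  # scores only the first coordinate of W h_j
print(gat_coefficients(h, eye, a, i=0, neighbors=[1, 2]))
print(sage_mean([1.0, 1.0], [[3.0, 3.0], [5.0, 5.0]]))
print(sage_max_pool([[1.0, -2.0], [-3.0, 4.0]], eye, [0.0, 0.0]))
```

GAT learns a separate weight per neighbor, whereas both GraphSAGE aggregators treat neighbors symmetrically; neither distinguishes edge types the way the RGCN relation weights do.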
DDIMDL [39].Deng et al. proposed a multimodal deep learning framework that combines diverse drug features with deep learning to build a model for predicting DDIassociated events.DDIMDL first constructs deep neural network (DNN)-based sub-models, respectively, using four types of drug features: chemical substructures, targets, enzymes and pathways, and then adopts a joint DNN framework to combine the sub-models to learn cross-modality representations of drug-drug pairs and predict DDI events.
DPSP [51].Masumshah et al. introduced a deep learning framework for predicting multiple drug side effects, divided into two steps.Firstly, it collects various drug information that may affect Drug-Drug Interactions (DDIs), such as individual drug side effects, targets, enzymes, chemical substructures, and pathways, to construct novel features.Then, predictions of 65, 100, and 185 categories of DDI events in DS1, DS2, and DS3 are executed through a deep multimodal framework.
GADNN [52].Nejati M et al. proposed a method to predict DDIs by considering the influence of different drug-related features.Their approach consists of two stages.In the first stage, four basic drug datasets are used to generate embedding vectors for each drug separately.Next, a new graph attention mechanism dynamically calculates the contribution coefficient of each dataset, and the weighted combination of these vectors is used to predict drug-drug interactions probability through a dense neural network.

Experiment setup and evaluation metrics
To evaluate the performance of the proposed method, five-fold cross-validation is first conducted. The whole benchmark dataset is randomly divided into five subsets; each time, one fold is used as the test set and the remaining four as training data. This is repeated five times, and the average result is taken as the final result. To predict DDIs involving unknown (newly emerged) drugs, we adopt a new data partitioning method. We divide the drugs into two major groups: confirmed (proved) drugs and novel (newly emerged) drugs. Novel drugs are treated as lacking any prior data, so all of their relationships are removed from the training data. Based on this partition, we divide the corresponding DDI dataset into confirmed drug pairs, confirmed drug-novel drug pairs, and novel drug pairs. Our model is trained on the confirmed drug pairs and performs prediction on confirmed drug pairs (Task 1), confirmed drug-novel drug pairs (Task 2), and novel drug pairs (Task 3). The average results across these runs reflect the stability of the proposed model. Six indicators are adopted to measure the multi-class performance of the model: accuracy (Acc), area under the precision-recall curve (AUPR), area under the receiver operating characteristic curve (AUC), F1 score, Precision, and Recall; AUPR and F1 are more sensitive to severely imbalanced data. Micro averaging is used for AUPR and AUC, while macro averaging is used for the other measures. The indicators are defined as follows:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where TP, TN, FP and FN denote the numbers of correctly predicted positive samples, correctly predicted negative samples, wrongly predicted positive samples, and wrongly predicted negative samples, respectively. In addition, we use the micro mode to calculate AUC and Recall, which treats each element of the label indicator matrix as a label. In contrast, F1 is calculated for each label in macro mode, and the unweighted average of the per-label values is taken.
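The cold-start partition and the macro-averaged metrics can be sketched together. `split_tasks` (a hypothetical name) assigns each labelled pair to Task 1, 2 or 3 according to how many of its drugs are in the novel set; training would then use only Task 1 pairs. `macro_metrics` computes accuracy and macro-averaged precision, recall and F1 from label lists. Micro-averaged AUC and AUPR require predicted scores and are left to a library such as scikit-learn (`roc_auc_score`, `average_precision_score`).

```python
def split_tasks(pairs, novel_drugs):
    """Partition labelled drug pairs by how many of the two drugs are novel.

    Task 1: both confirmed; Task 2: one confirmed, one novel; Task 3: both novel.
    """
    tasks = {1: [], 2: [], 3: []}
    for a, b, label in pairs:
        n_novel = (a in novel_drugs) + (b in novel_drugs)  # 0, 1 or 2
        tasks[n_novel + 1].append((a, b, label))
    return tasks


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec, rec, f1 = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p_c)
        rec.append(r_c)
        f1.append(2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0)
    k = len(labels)  # unweighted average over classes = macro mode
    return {"acc": acc, "precision": sum(prec) / k, "recall": sum(rec) / k, "f1": sum(f1) / k}


# Placeholder drug ids and risk-level labels, for illustration only.
pairs = [("d1", "d2", 2), ("d1", "n1", 3), ("n1", "n2", 0)]
tasks = split_tasks(pairs, novel_drugs={"n1", "n2"})
print({t: len(p) for t, p in tasks.items()})
print(macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```

Macro averaging gives each risk level equal weight, so a model that ignores the rare Minor class is penalized even if its overall accuracy stays high.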

Results and discussion
To evaluate the performance of AERGCN-DDI, we conducted extensive experiments on the three tasks, comparing it with seven state-of-the-art methods under five-fold cross-validation. Tables 1, 2

Comparison of AERGCN-DDI and comparative methods on Task 1
To evaluate the effectiveness of our method for drug-drug interaction prediction in a warm-start setting (Task 1), we compared AERGCN-DDI with seven other state-of-the-art models. The experimental results are shown in Table 1.
The experimental results show that AERGCN-DDI achieves the best overall performance in predicting interaction events between approved drugs under warm-start conditions, with ACC, AUPR, AUC, F1, Precision, and Recall of 93.81%, 90.1%, 96.15%, 91.48%, 93.17%, and 90.04%, respectively. ACC, F1, Precision, and Recall are all the best among the compared methods, improving over the second-best methods by 2.79%, 4.33%, 3.53%, and 4.82%, respectively. To examine the overall effectiveness of the various methods in more detail, Fig. 4 presents boxplots of ACC, AUPR, AUC, F1, Precision, and Recall for all baseline models over all events. These results show that, among the compared methods, AERGCN-DDI is the most effective at predicting interactions between approved drugs.

The performance of AERGCN-DDI on Task 2 and Task 3 under five-fold cross-validation
To validate the performance of the proposed model in a cold-start setting, we simulated the emergence of new drugs and performed five-fold cross-validation. Task 2 simulates interaction prediction between old and new drugs, and Task 3 simulates interaction prediction between pairs of new drugs. The complete experimental results are shown in Tables 2 and 3, and Figs. 5 and 6 visualize the corresponding data.
In Task 2 and Task 3, AERGCN-DDI showed clear advantages on all evaluation metrics. In Task 2, it outperforms the second-best method by 14.01%, 16.79%, 7.07%, 17.9%, 14.78%, and 18.68% in terms of ACC, AUPR, AUC, F1, Precision, and Recall, respectively. In Task 3, it outperforms the second-best method by 15.37%, 22.69%, 11.94%, 22.83%, 28.25% and 20.08% on the same metrics. This indicates that AERGCN-DDI has stronger predictive ability and generalization when new drugs emerge and is better suited to mining the potential relationships of unknown drugs, which supports further research and application in drug interaction prediction.
To verify the effect of the embedding dimension on the results, we used PCA to generate 100-, 150-, 200-, 250-, and 300-dimensional features and fed them into AERGCN-DDI (Task 1). Figure 7 shows the results of AERGCN-DDI with these embedding dimensions. As the number of embedding dimensions increases, the evaluation indicators on both the training and testing sets increase steadily, and the 300-dimensional features give the best values, so the feature dimension is set to 300.

Comparison of AERGCN-DDI and other state-of-the-art methods on the DrugBank dataset
To further validate the effectiveness of AERGCN-DDI in the multi-class DDI event scenario, we used the DrugBank dataset, which contains 65 imbalanced classes. We compared AERGCN-DDI with the following state-of-the-art DDI prediction methods. Of note, the results for the comparative models are taken from the experiments reported in the MSEDDI article.
DeepDDI [53] consists of structural similarity profiles (SSP) and a DNN. It takes chemical structures and drug names as inputs and generates human-readable sentences that describe DDI types.
Lee's method [54] employs autoencoders and a deep feed-forward network, trained with the SSP, GSP, and TSP of known drug pairs, to predict the pharmacological effects of DDIs.
DDIMDL [39] employs four drug features: chemical substructures, targets, enzymes, and pathways.It uses a joint DNN framework to combine the sub-models, learn cross-modality representations of drug pairs, and predict DDI events.
MDF-SA-DDI [55] combines two drugs in four different ways and feeds the resulting drug features into four different drug fusion networks (a Siamese network, a convolutional neural network, and two autoencoders) to obtain latent feature vectors for drug pairs. Latent feature fusion is then performed with self-attention mechanisms.
Fig. 6 The ACC, AUPR, AUC, F1, Precision and Recall of compared methods on Task 3 of the DDInter dataset
MSEDDI [56] designs three-channel networks to handle biomedical network-based knowledge graph embedding, SMILES sequence-based notation embedding, and molecular graph-based chemical structure embedding.These channels' output features are then combined through a self-attention mechanism.
As shown in Table 4, our method outperforms the compared methods on the DrugBank dataset. AERGCN-DDI achieves the best performance, with an accuracy of 58.34%, improving accuracy by 13.83%, AUPR by 13.88%, and AUC by 0.76% over the second-best method. This comparison on the second dataset further reveals the advantages of AERGCN-DDI in predicting multi-type DDI events and demonstrates its promising performance.

Ablation study
To validate the use of drug fingerprints as node attributes and to verify the contribution of the different components of AERGCN-DDI, including the multi-head attention mechanism and the edge propagation module, we performed ablation experiments with the following variants:
AERGCN w/oFP: a variant of AERGCN-DDI that does not use node fingerprint features and relies only on the topology information of the DDI network.
AERGCN w/oAT: the full AERGCN-DDI model with the multi-head attention component removed.
AERGCN w/oEP: the full AERGCN-DDI model with the edge propagation component removed.
According to Table 5, AERGCN-DDI performs markedly better than the variant models on all tasks and evaluation metrics. Conversely, AERGCN w/oFP, the variant without the molecular fingerprint feature, showed the lowest performance and the largest decrease in effectiveness. These results show that drug fingerprints used as node attributes are the most important features of AERGCN-DDI. Drug fingerprints provide rich information about the structure and properties of drug molecules, which helps the model better understand drug interactions and their effects.
Furthermore, the edge propagation module is one of the key components of the AERGCN-DDI model, which helps the model to better utilize the edge attribute information, including the mode of action and effects of drug combinations.The results of the ablation experiments show that the performance of the variant model with the edge propagation module removed significantly decreases, further confirming the importance of the edge propagation module in the model.
Lastly, the multi-head attention mechanism is another key component of AERGCN-DDI. It allows the model to attend to different drug interaction features simultaneously, improving its ability to capture complex interactions. In the ablation experiments, the performance of the variant without multi-head attention decreased on Task 2 and Task 3, indicating that the mechanism plays an important role in enhancing model performance.
In summary, the drug fingerprint as a node attribute, edge propagation module, and multi-head attention mechanism are key components of the predictive performance of AERGCN-DDI.Their effective integration and utilization enable the AERGCN-DDI model to predict drug-drug interactions more accurately, providing important support for drug development and clinical applications.

Conclusions
In this work, we proposed AERGCN-DDI, a novel model that leverages relational graph convolutional networks (RGCN) and multi-head attention to predict the specific risk levels of drug combinations. Our model uses RGCN to capture the topological and semantic characteristics of drug nodes, distinguishing between four risk levels and aggregating diverse neighborhood information. Additionally, the multi-head attention mechanism enhances the model's ability to capture multi-level topology information. In contrast to conventional experimental setups, we designed experiments that simulate the real-world emergence of new drugs that have no prior interactions with existing ones. AERGCN-DDI achieved accuracies of 93.81% for established drugs, 84.93% for newly introduced drugs, and 72.66% when both drugs were novel, showing strong performance in both warm-start and cold-start settings. In addition, we performed cross-dataset validation on the DrugBank dataset to further verify the reliability and applicability of our model, and we conducted ablation experiments to verify the importance of each component module. A limitation of the model is that the dataset may be biased towards common drug interactions, which limits its ability to generalize to rare drug interactions. In future work, to enhance the applicability and robustness of AERGCN-DDI, we recommend integrating more drug features such as molecular structure or pharmacokinetics. Exploring different graph structures or incorporating temporal information into the model architecture may also improve its performance. In addition, applying the model to interactions other than drug-drug interactions (DDIs), such as drug-disease or drug-food interactions, could extend its application in clinical practice. AERGCN-DDI has proved to be an efficient and competitive drug combination risk prediction tool that can aid medical decision-making, drug development, and disease treatment, supporting better and safer medical interventions and services.

Fig. 2 The different risk levels of DDI events in the DDInter dataset

Fig. 3

Fig. 7 The performance of AERGCN-DDI under different feature dimensions

Table 1
The performance of AERGCN-DDI on Task 1 of the DDInter dataset

Table 2
The performance of AERGCN-DDI on Task 2 of the DDInter dataset. Bold indicates the method that performs best on this indicator

Table 3
The performance of AERGCN-DDI on Task 3 of the DDInter dataset. Bold indicates the method that performs best on this indicator

Table 4
The performance of all methods on the DrugBank dataset. Bold indicates the method that performs best on this indicator

Table 5
The ablation performance of AERGCN-DDI on different tasks. Bold indicates the method that performs best on this indicator